Abstract

Starting from a sequence regarded as a walk through some set of values, we consider the associated 
loop-erased walk as a sequence of directed edges, with an edge from i to j if the loop erased walk makes 
a step from i to j. We introduce a coloring of these edges by painting edges with a fixed color as long 
as the walk does not loop back on itself, then switching to a new color whenever a loop is erased, with 
each new color distinct from all previous colors. The pattern of colors along the edges of the loop-erased 
walk then displays stretches of consecutive steps of the walk left untouched by the loop-erasure process. 
Assuming that the underlying sequence generating the loop-erased walk is a sequence of independent 
random variables, each uniform on $[N] := \{1, 2, \ldots, N\}$, we condition the walk to start at $N$ and stop
the walk when it first reaches the subset $[k]$, for some $1 \le k \le N - 1$. We relate the distribution of the
random length of this loop-erased walk to the distribution of the length of the first loop of the walk, via
Cayley's enumerations of trees, and via Wilson's algorithm. For fixed $N$ and $k$, and $i = 1, 2, \ldots$, let $B_i$
denote the event that the loop-erased walk from $N$ to $[k]$ has $i + 1$ or more edges, and the $i$th and $(i+1)$th
of these edges are colored differently. We show that given that the loop-erased random walk has $j$ edges
for some $1 \le j \le N - k$, the events $B_i$ for $1 \le i \le j - 1$ are independent, with the probability of $B_i$
equal to $1/(k + i + 1)$. This determines the distribution of the sequence of random lengths of differently
colored segments of the loop-erased walk, and yields asymptotic descriptions of these random lengths as
$N \to \infty$.

1 Introduction 

The loop-erased walk derived from a sequence $(X_n, n = 0, 1, \ldots)$ is a sequence $(\mathcal{L}_n, n = 0, 1, \ldots)$ of finite
subsequences of $(X_n, n = 0, 1, \ldots)$ defined as follows: let $\mathcal{L}_0 = (Y_{0,0}) = (X_0)$, and inductively, if $X_n$ is
not in $\mathcal{L}_{n-1} = (Y_{n-1,0}, \ldots, Y_{n-1,L_{n-1}})$, then form $\mathcal{L}_n$ by appending $X_n$ to the end of $\mathcal{L}_{n-1}$, i.e.,
$\mathcal{L}_n = (Y_{n,0}, \ldots, Y_{n,L_n})$ with $L_n = L_{n-1} + 1$, $Y_{n,i} = Y_{n-1,i}$ for $0 \le i \le L_{n-1}$, and $Y_{n,L_n} = X_n$. On the other hand,
if $X_n = Y_{n-1,j}$ for some $0 \le j \le L_{n-1}$, then construct $\mathcal{L}_n$ by truncating the part of $\mathcal{L}_{n-1}$ beyond $Y_{n-1,j}$,
i.e., let $L_n = j$, and define $\mathcal{L}_n = (Y_{n,0}, \ldots, Y_{n,L_n})$ by $Y_{n,i} = Y_{n-1,i}$ for $0 \le i \le j$; in this case, we say that a
loop has occurred. For each $n = 0, 1, \ldots$, we call $L_n$ the length of the loop-erased walk at time $n$, with the
understanding that if $\mathcal{L}_n$ is a single point $(X_i)$ for some $i$, the length is zero. So the length measures the
number of steps of the path, or the number of edges of the path, with each edge representing some one-step
transition $(X_n, X_{n+1})$ of the original sequence $(X_n, n = 0, 1, \ldots)$. See [6] and [7] for equivalent alternative
definitions of loop-erased walk, and discussions of some basic results on the loop-erasure of random sequences.

There is a natural way to "color" the loop-erasure of the walk as follows. Assume that we have some 
infinite palette $\{C_1, C_2, \ldots\}$ of colors. Run the walk, and until the first loop occurs, color the edges of the
walk with the color $C_1$. When the first loop occurs, erase the colored edges as the definition of loop-erasure
requires, and continue the walk, now coloring the subsequent edges with the color $C_2$ until the next loop
occurs. More generally, keep coloring the edges of the walk with a fixed color $C_i$ until a loop occurs, at which
point we change to a new fixed color $C_{i+1}$.
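
The coloring rule can be made concrete with a short sketch. The function name `colored_loop_erasure` and the use of integers 1, 2, ... as color labels are illustrative assumptions, not notation from the paper; a return to the current endpoint (a self-loop) is treated as a loop, in line with the definition of loop-erasure above.

```python
def colored_loop_erasure(xs):
    """Colored loop-erasure of the finite sequence xs (illustrative sketch).

    Returns (path, colors): path lists the vertices of the loop-erased walk
    after all of xs has been processed, and colors[i] is the color label of
    the edge path[i] -> path[i + 1].
    """
    path = [xs[0]]
    colors = []
    current = 1                      # label of the color currently in use
    for x in xs[1:]:
        if x in path:
            j = path.index(x)        # a loop has occurred at position j
            del path[j + 1:]         # truncate the path beyond position j
            del colors[j:]           # erase the edges of the loop
            current += 1             # switch to a color never used before
        else:
            path.append(x)
            colors.append(current)
    return path, colors
```

For instance, the sequence 5, 3, 4, 3, 2, 1 loops back to 3 once, so the surviving edge 5 -> 3 keeps the first color while the two edges drawn afterwards carry the second.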

Fix a positive integer $N \ge 2$, and let $(X_n, n = 0, 1, \ldots)$ be a sequence of independent random variables,
with $X_0 = N$ for convenience, and $X_1, X_2, \ldots$ independent and uniformly distributed on the set $[N] :=
\{1, 2, \ldots, N\}$. We use this random sequence to construct a loop-erased random walk on $[N]$, following the
definition for loop-erased walk above. Apart from some delay due to self-loops when $X_{n+1} = X_n$ for some
$n$, the sequence of steps of the loop-erased walk is the same as if it were derived from a random walk on
the complete graph. In particular, we are interested in the random coloring of the loop-erased walk when it
first reaches the subset $[k]$ for some $1 \le k \le N - 1$, and this random coloring has the same distribution for
a sequence of independent random variables $(X_n, n = 0, 1, \ldots)$ as for a random walk on the complete graph.
Let $R_N$ denote the first repeat time for $X_0, X_1, \ldots$, i.e., the first index $i$ such that $X_i \in \{X_0, \ldots, X_{i-1}\}$,
which is the length of the first loop that is erased in the process of loop-erasure. The distribution of $R_N$ is
determined by the well-known solution of the classical birthday problem, that is,

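$$\mathbb{P}(R_N > j) = \prod_{i=1}^{j} \left(1 - \frac{i}{N}\right), \tag{1}$$

$$\mathbb{P}(R_N = j) = \frac{j}{N} \prod_{i=1}^{j-1} \left(1 - \frac{i}{N}\right). \tag{2}$$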

Our main result relates the distribution of $R_N$ to the distribution of the length of the loop-erased walk
stopped when it hits the target set $[k]$.

Let $\zeta_{N,k}$ be the first time $t$ such that $X_t \in [k]$, and note that $\zeta_{N,k}$ is a geometric random variable
with parameter $k/N$. We use the notation $X \overset{d}{=} Y$ to mean that random variables $X$ and $Y$ have the same
distribution, and $X \overset{d}{=} (Y \mid A)$ to indicate that the distribution of $X$ is the same as the conditional distribution
of $Y$ given the event $A$. Also let

$$(a)_b := a(a-1)(a-2) \cdots (a-b+1)$$

be the usual falling factorial for $b = 1, 2, \ldots$ with $(a)_0 = 1$.

Theorem 1. Let $\lambda_{N,k}$ be the length of the loop-erased random walk derived from $X_0 = N$ and a sequence of
independent variables $X_1, X_2, \ldots$ with uniform distribution on $[N]$, stopped at time $\zeta_{N,k}$ when the sequence
first hits $[k]$, and let $R_N$ be the first repeat time derived from the same sequence. Then

$$\lambda_{N,k} \overset{d}{=} (R_N - k \mid R_N > k);$$

that is, for every $1 \le j \le N - k$,

$$\mathbb{P}(\lambda_{N,k} = j) = \mathbb{P}(R_N - k = j \mid R_N > k) = \frac{(k+j)(N-k-1)_{j-1}}{N^j}. \tag{3}$$

Moreover, in the colored loop-erased walk of length $\lambda_{N,k}$ obtained by stopping at time $\zeta_{N,k}$, let $B_i$ denote the
event that the $i$th and $(i+1)$th edges of the loop-erased walk are colored differently. Then given $\lambda_{N,k} = j$,
the events $B_i$ for $1 \le i \le j - 1$ are independent with

$$\mathbb{P}(B_i \mid \lambda_{N,k} = j) = \frac{1}{k+i+1}.$$

We prove this result in Section 2, then show in Section 3 how the simple formula (3) is closely related
both to Wilson's loop-erased random walk algorithm to generate spanning trees of a graph, and to Cayley's
formula for the number of forests with a fixed number of vertices and a fixed set of roots. In Section 4
we relate Theorem 1 to other basic results about compositions which are closely connected to Aldous's
Brownian continuum random tree [1] and to stick-breaking schemes. In Section 5 we discuss some open
questions which arise naturally from our analysis.
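
As a sanity check on formula (3), the following sketch evaluates both sides in exact rational arithmetic for small parameters. The helper names (`falling`, `repeat_tail`, `repeat_pmf`, `length_pmf`) are chosen here for illustration only; the birthday-problem tail is taken to be $\mathbb{P}(R_N > j) = (N-1)_j / N^j$.

```python
from fractions import Fraction


def falling(a, b):
    """Falling factorial (a)_b = a (a-1) ... (a-b+1), with (a)_0 = 1."""
    out = 1
    for t in range(b):
        out *= a - t
    return out


def repeat_tail(N, j):
    """P(R_N > j) for the first repeat time of X_0, X_1, ... uniform on [N]."""
    return Fraction(falling(N - 1, j), N ** j)


def repeat_pmf(N, j):
    """P(R_N = j), computed as P(R_N > j - 1) - P(R_N > j)."""
    return repeat_tail(N, j - 1) - repeat_tail(N, j)


def length_pmf(N, k, j):
    """Right-hand side of (3): (k + j) (N - k - 1)_{j-1} / N^j."""
    return Fraction((k + j) * falling(N - k - 1, j - 1), N ** j)
```

With $N = 7$ and $k = 2$, say, `length_pmf(N, k, j)` agrees with `repeat_pmf(N, k + j) / repeat_tail(N, k)` for every $1 \le j \le N - k$, and the values sum to 1.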

2 Proof of Theorem 1

Most of the work for this proof is done by the following lemma, where we use the notation of Theorem 1.

Lemma 1. Let $\nu_{N,k}$ denote the number of different colors of segments of the colored loop-erased walk started
at $X_0 = N$ and stopped on first reaching $[k]$, let $\lambda_{N,k}$ be the total length of the loop-erased walk at this
time, and given $\lambda_{N,k} = j$, let $Y_i$ denote the indicator of the event $B_i$ that the $i$th and $(i+1)$th edges in the
loop-erased random walk from $N$ to $[k]$ are colored differently. Then for each choice of positive integers $s$
and $j$ with $s \le j$ and each choice of positive integers $i_1, \ldots, i_{s-1}$ with

$$0 < i_1 < \cdots < i_{s-1} < j,$$

$$\mathbb{P}(\lambda_{N,k} = j,\ \nu_{N,k} = s,\ Y_{i_1} = 1, Y_{i_2} = 1, \ldots, Y_{i_{s-1}} = 1,\ Y_i = 0 \text{ for } i \in [j-1] - \{i_1, \ldots, i_{s-1}\})
= \frac{(k+1)(N-k-1)_{j-1}}{N^j} \prod_{r=1}^{s-1} \frac{1}{k + i_r}, \tag{4}$$

where if $s = 1$, the product is understood to equal 1.

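Proof. Consider first the case when all edges are painted the same color, i.e., $\nu_{N,k} = s = 1$, and $Y_i = 0$ for
all $i$. Then loop-erasure can only occur at vertex $N$, and thus, the random walk either never returns to $N$, or
takes some number $n$ of steps to vertices in $[N] - [k]$ and then hits $N$ one final time; after its last visit to $N$
it hits $(j - 1)$ distinct vertices excluding $N$ and $[k]$ before finally hitting some element of $[k]$. It thus follows that

$$\mathbb{P}(\lambda_{N,k} = j,\ \nu_{N,k} = 1) = \left(1 + \sum_{n=0}^{\infty} \left(1 - \frac{k}{N}\right)^{n} \frac{1}{N}\right) \frac{(N-k-1)_{j-1}}{N^{j-1}} \cdot \frac{k}{N} = \frac{(k+1)(N-k-1)_{j-1}}{N^j}$$

as desired.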

Next, consider the case of two colors, with a color change at the $h$th vertex in the path from $N$ to $[k]$ for
some $1 \le h \le j - 1$. Again, the random walk takes $n$ steps outside of $[k]$ before hitting vertex $N$ for the
last time, and then the random walk takes $h$ steps without looping, to distinct vertices. At the $h$th vertex,
the random walk takes $m$ steps to vertices outside of $[k]$ and the 0th, 1st, $\ldots$, $(h-1)$th vertices before hitting
the $h$th vertex one final time, and then it takes $j - h$ steps without looping until it hits something in $[k]$. So
again appealing to independence and the uniform distribution of the variables $X_i$ for $i \ge 1$, we see that

$$\mathbb{P}(\lambda_{N,k} = j,\ \nu_{N,k} = 2,\ Y_h = 1,\ Y_i = 0 \text{ for } i \in [j-1] - \{h\})$$

$$= \left(1 + \sum_{n=0}^{\infty} \left(1 - \frac{k}{N}\right)^{n} \frac{1}{N}\right) \left(\sum_{m=0}^{\infty} \left(1 - \frac{k+h}{N}\right)^{m} \frac{1}{N}\right) \frac{(N-k-1)_{j-1}}{N^{j-1}} \cdot \frac{k}{N}$$

$$= \frac{(k+1)(N-k-1)_{j-1}}{N^j} \cdot \frac{1}{k+h}$$

as desired.

Extending this argument to three or more colors is straightforward. □ 

Proof of formula (3). The equality of the second and third expressions is evident from (1) and (2). To prove
the equality of the first and third expressions in this equation, we sum up equation (4) over all $s$ between 1
and $j$ and all possible sequences $(i_1, \ldots, i_{s-1})$, corresponding to all possible subsets of $[j - 1]$, including
the empty subset, and to all possible sequences of values of the color change indicators $(Y_i)$. Then (3) is
seen to amount to

$$f(j) := \sum_{I \subseteq [j-1]} \prod_{i \in I} \frac{1}{k+i} = \frac{k+j}{k+1}, \tag{5}$$

where it is understood that if $I$ is the empty set, then the product equals 1. It is obvious that $f(1) = 1$;
suppose inductively that $f(j-1) = \frac{k+j-1}{k+1}$. Then using the fact that $f(j) = f(j-1) + \frac{1}{k+j-1} f(j-1)$ yields
equation (5) for any $j$, and finishes the proof of (3). □
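
The subset-sum identity used in this proof, namely that the sum over all $I \subseteq [j-1]$ of $\prod_{i \in I} 1/(k+i)$ equals $(k+j)/(k+1)$, can be checked by brute force for small $j$. The function name `f_bruteforce` is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations


def f_bruteforce(k, j):
    """Sum over all subsets I of {1, ..., j-1} of prod_{i in I} 1/(k+i)."""
    total = Fraction(0)
    for size in range(j):                      # subset sizes 0, ..., j-1
        for subset in combinations(range(1, j), size):
            term = Fraction(1)                 # empty product equals 1
            for i in subset:
                term /= (k + i)
            total += term
    return total
```

The enumeration visits $2^{j-1}$ subsets, so it is only meant for small $j$; the induction in the proof is what establishes the identity in general.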

For the second part of the Theorem, we need a lemma about Bernoulli trials:

Lemma 2. A sequence $Y_1, \ldots, Y_{j-1}$ is an independent sequence of Bernoulli random variables, such that $Y_i$
has parameter $\frac{1}{k+i+1}$, if and only if for each choice of positive integers $i_1, \ldots, i_{s-1}$ with

$$0 < i_1 < \cdots < i_{s-1} < j,$$

$$\mathbb{P}(Y_{i_1} = \cdots = Y_{i_{s-1}} = 1,\ Y_i = 0 \text{ for } i \in [j-1] - \{i_1, \ldots, i_{s-1}\}) = \frac{k+1}{k+j} \prod_{r=1}^{s-1} \frac{1}{k + i_r}.$$

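Proof. Suppose that the sequence $Y_1, \ldots, Y_{j-1}$ is independent Bernoulli, such that $Y_i$ has parameter
$p_i := \frac{1}{k+i+1}$. Then

$$\mathbb{P}(Y_{i_1} = \cdots = Y_{i_{s-1}} = 1,\ Y_i = 0 \text{ for } i \in [j-1] - \{i_1, \ldots, i_{s-1}\}) = \prod_{r=1}^{s-1} p_{i_r} \prod_{i \in [j-1] - \{i_1, \ldots, i_{s-1}\}} (1 - p_i).$$

Using the fact that $p_i = p_{i-1}(1 - p_i)$ for $i = 1, \ldots, j - 1$ (where we let $p_0 = \frac{1}{k+1}$) and the fact that
$(1 - p_1) \cdots (1 - p_{j-1}) = \frac{k+1}{k+j}$, we see that this last product becomes

$$\prod_{r=1}^{s-1} \frac{p_{i_r}}{1 - p_{i_r}} \prod_{i=1}^{j-1} (1 - p_i) = \frac{k+1}{k+j} \prod_{r=1}^{s-1} p_{i_r - 1} = \frac{k+1}{k+j} \prod_{r=1}^{s-1} \frac{1}{k + i_r}$$

as desired. The converse is obvious by just reversing the sequence of equalities. □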
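
The product identity of Lemma 2 can also be confirmed mechanically. The sketch below, with illustrative function names, compares the probability of each 0/1 pattern under independent Bernoulli($1/(k+i+1)$) indicators with the closed form $\frac{k+1}{k+j} \prod_r \frac{1}{k+i_r}$.

```python
from fractions import Fraction
from itertools import combinations


def pattern_probability(k, j, ones):
    """P(Y_i = 1 for i in ones, Y_i = 0 otherwise), 1 <= i <= j-1,
    for independent Y_i with parameter 1/(k+i+1)."""
    prob = Fraction(1)
    for i in range(1, j):
        p = Fraction(1, k + i + 1)
        prob *= p if i in ones else 1 - p
    return prob


def lemma2_rhs(k, j, ones):
    """Closed form (k+1)/(k+j) * prod_{i in ones} 1/(k+i)."""
    rhs = Fraction(k + 1, k + j)
    for i in ones:
        rhs /= (k + i)
    return rhs


def check_lemma2(k, j):
    """Compare both sides over every subset of {1, ..., j-1}."""
    return all(
        pattern_probability(k, j, set(ones)) == lemma2_rhs(k, j, ones)
        for size in range(j)
        for ones in combinations(range(1, j), size)
    )
```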
Proof of Theorem 1. Formula (3) has already been established. From Lemma 1 and (3) we see that

$$\mathbb{P}(\nu_{N,k} = s,\ Y_{i_1} = 1, Y_{i_2} = 1, \ldots, Y_{i_{s-1}} = 1,\ Y_i = 0 \text{ for } i \in [j-1] - \{i_1, \ldots, i_{s-1}\} \mid \lambda_{N,k} = j)
= \frac{k+1}{k+j} \prod_{r=1}^{s-1} \frac{1}{k + i_r}, \tag{6}$$

and now the conclusion follows from Lemma 2. □
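
A Monte Carlo experiment offers an informal check of the second part of Theorem 1. The sketch below is an assumption-laden illustration rather than part of the proof: the function names, the fixed seed, and the choice of parameters are all hypothetical. It simulates the walk from $N$ until it first hits $[k]$, keeps the runs with $\lambda_{N,k} = j$, and estimates the conditional frequency of each color change $B_i$.

```python
import random
from collections import Counter


def stopped_colored_walk(N, k, rng):
    """Run the walk from N until it first hits [k]; return the list of
    edge colors of the loop-erased path at that time."""
    path, colors, current = [N], [], 1
    while True:
        x = rng.randint(1, N)            # uniform step on {1, ..., N}
        if x in path:                    # loop: truncate and change color
            pos = path.index(x)
            del path[pos + 1:]
            del colors[pos:]
            current += 1
        else:
            path.append(x)
            colors.append(current)
            if x <= k:                   # first hit of the target set [k]
                return colors


def estimate_change_probs(N, k, j, trials, seed=0):
    """Estimate P(B_i | lambda = j) for 1 <= i <= j-1 by simulation."""
    rng = random.Random(seed)
    hits = 0
    changes = Counter()
    for _ in range(trials):
        colors = stopped_colored_walk(N, k, rng)
        if len(colors) != j:
            continue
        hits += 1
        for i in range(1, j):
            if colors[i - 1] != colors[i]:
                changes[i] += 1
    return {i: changes[i] / hits for i in range(1, j)}, hits
```

With, for example, $N = 8$, $k = 2$, $j = 3$ and a few tens of thousands of trials, the estimates should lie near $1/(k+i+1)$, i.e. near $1/4$ and $1/5$, up to sampling error.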



3 The length of the loop-erased random walk 

3.1 The Markov property of the length of the loop-erasure 

An alternative method of proving formula (3) begins with the following observation: if $L_n$ denotes the
length of the loop-erasure of the i.i.d. sequence $(X_0, \ldots, X_n)$ of random variables uniform on $[N]$, then
$(L_n, n = 0, 1, \ldots)$ has the same dynamics as a Markov chain with the following transition probabilities,
started at $L_0 = 0$:

$$Q_N(i,j) = \begin{cases} 1/N & 0 \le j \le i \\ (N-i-1)/N & j = i+1 \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

In fact, such a chain is obtained by setting, for $n = 1, 2, \ldots$, $L_n = \min(L_{n-1} + 1, X_n - 1)$. Although by definition $L_0 = 0$ throughout this article,
we could just as well start with some arbitrary loop-less path (and a vertex at the end of the path from which
to step) whose length $L_0$ is a random variable taking values in $\{0, 1, \ldots, N - 1\}$ and then run loop-erased
random walk; if we then choose $X_0$ independent of $X_1, X_2, \ldots$ so that $X_0 - 1 = L_0$, it follows by induction
that

$$L_n = \min_{0 \le j \le n} (X_{n-j} + j - 1).$$
Using the independence of the $(X_i, i = 0, 1, \ldots)$, it follows that

$$Q_N^n(i, [m, \infty)) = \mathbb{P}(L_n \ge m \mid L_0 = i)
= \mathbb{P}\Big(\min_{0 \le j \le n} (X_{n-j} + j - 1) \ge m \,\Big|\, X_0 - 1 = i\Big)
= 1(i + n \ge m)\, f_{N,m-n+1} f_{N,m-n+2} \cdots f_{N,m} \tag{8}$$

where for $X$ a random variable uniformly distributed on $[N]$, $f_{N,m} := \mathbb{P}(X > m)$; note that $f_{N,m} = 1$ if
$m \le 0$ and $f_{N,m} = 0$ if $m \ge N$. The indicator reflects the fact that $L_n \le L_0 + n$. From this, we obtain arbitrary entries of powers of the transition matrix:

$$Q_N^n(i, m) = Q_N^n(i, [m, \infty)) - Q_N^n(i, [m+1, \infty))
= f_{N,m-n+2} \cdots f_{N,m} \big( 1(i + n \ge m)\, f_{N,m-n+1} - 1(i + n \ge m + 1)\, f_{N,m+1} \big) \tag{9}$$

Moreover, letting $n \to \infty$ in (8) and (9) and using the fact that $X_1, X_2, \ldots$ are nonnegative random variables, it
follows that

$$\lim_{n \to \infty} Q_N^n(i, [m, \infty)) = f_{N,1} \cdots f_{N,m}$$

$$\lim_{n \to \infty} Q_N^n(i, m) = f_{N,1} \cdots f_{N,m} (1 - f_{N,m+1})$$

where if $m \le 0$, the product $f_{N,1} \cdots f_{N,m}$ is understood to be 1. Note that the first limit is the probability
$\mathbb{P}(R_N > m)$, and the second limit is the probability $\mathbb{P}(R_N = m + 1)$. Thus, applying the convergence theorem
for irreducible aperiodic Markov chains [3, page 314], we obtain the following result:

Proposition 1. The stationary distribution of the Markov chain $(L_n, n = 0, 1, \ldots)$ is the distribution of the
random variable $R_N - 1$, where $R_N$ is the index of the first repeat in an i.i.d. sequence of random variables
uniform on $[N]$.
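
Proposition 1 is easy to test numerically for a small state space. The following sketch, with hypothetical helper names, builds the matrix $Q_N$ of (7) on the states $0, \ldots, N-1$ in exact arithmetic and checks that the law of $R_N - 1$ is invariant under one step of the chain.

```python
from fractions import Fraction


def transition_matrix(N):
    """The matrix Q_N of (7) on states 0, ..., N-1."""
    Q = [[Fraction(0)] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1):
            Q[i][j] = Fraction(1, N)            # loop back to length j <= i
        if i + 1 < N:
            Q[i][i + 1] = Fraction(N - i - 1, N)  # extend the path by one edge
    return Q


def first_repeat_minus_one(N):
    """List of P(R_N - 1 = m) = P(R_N = m + 1) for m = 0, ..., N-1."""
    dist = []
    tail = Fraction(1)                           # P(R_N > m), starting at m = 0
    for m in range(N):
        next_tail = tail * Fraction(N - m - 1, N)  # P(R_N > m + 1)
        dist.append(tail - next_tail)
        tail = next_tail
    return dist


def step(dist, Q):
    """One step of the chain: the row vector dist multiplied by Q."""
    n = len(dist)
    return [sum(dist[i] * Q[i][j] for i in range(n)) for j in range(n)]
```

For $N = 6$, `step(first_repeat_minus_one(6), transition_matrix(6))` returns the same vector, as the proposition predicts.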

As an aside, the exact same reasoning can be applied to a non-uniform random variable X on the positive 
integers, to obtain the following: 

Corollary 1. Let $X$ be a positive-integer-valued random variable, and define an independent sequence
$X_0, X_1, \ldots$, where $X_0$ has some distribution on the positive integers, and $X_1, X_2, \ldots$ is an i.i.d. sequence
of variables with the same distribution as $X$. Define a transition matrix $Q$ on the nonnegative integers as
follows:

$$Q(i,j) = \begin{cases} \mathbb{P}(X = j + 1) & 0 \le j \le i \\ \mathbb{P}(X > i + 1) & j = i + 1 \\ 0 & \text{otherwise} \end{cases}$$

Then if $L_n$ is defined from $(X_0, \ldots, X_n)$ by $L_0 = X_0 - 1$ and $L_n = \min(L_{n-1} + 1, X_n - 1)$ as above, $(L_n, n = 0, 1, \ldots)$ has the same transition dynamics
as a Markov chain with transition matrix $Q$, and if $g_m := \mathbb{P}(X > m)$, then powers of the transition matrix
are given by

$$Q^n(i, m) = g_{m-n+2} \cdots g_m \big( 1(i + n \ge m)\, g_{m-n+1} - 1(i + n \ge m + 1)\, g_{m+1} \big).$$

Moreover, if $\mathbb{P}(X = 1) > 0$, then the Markov chain is irreducible and positive recurrent on the nonnegative
integers, with a stationary distribution $\pi$ determined by either of the following formulas for $m = 1, 2, \ldots$:

$$\pi([m, \infty)) = \prod_{i=1}^{m} \mathbb{P}(X > i)$$

$$\pi(m) = \mathbb{P}(X \le m + 1) \prod_{i=1}^{m} \mathbb{P}(X > i)$$
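
We now use equation (9) to provide a second proof of formula (3). Conditioning on the value of $\zeta_{N,k}$,
which is a geometric random variable with parameter $k/N$, we see that for $1 \le j \le N - k$,

$$\mathbb{P}(\lambda_{N,k} = j) = \mathbb{P}(L_{\zeta_{N,k}} = j) = \sum_{n=1}^{\infty} \mathbb{P}(L_{n-1} = j - 1 \mid \zeta_{N,k} = n)\, \mathbb{P}(\zeta_{N,k} = n)
= \sum_{n=1}^{\infty} Q_{N-k}^{n-1}(0, j-1) \left(1 - \frac{k}{N}\right)^{n-1} \frac{k}{N}. \tag{10}$$

Here we use that, given $\zeta_{N,k} = n$, the first $n - 1$ steps are uniform on the $N - k$ vertices outside $[k]$, and the final step into $[k]$ adds one edge. To calculate $Q_{N-k}^{n-1}(0, j-1)$, we use equation (9), to see that

$$Q_{N-k}^{n-1}(0, j-1) = \begin{cases}
f_{N-k,1} \cdots f_{N-k,j-1} (1 - f_{N-k,j}) = \mathbb{P}(R_{N-k} = j) & \text{if } j \le n - 1 \\
f_{N-k,1} \cdots f_{N-k,j-1} = \mathbb{P}(R_{N-k} > j - 1) & \text{if } j = n \\
0 & \text{if } j > n
\end{cases} \tag{11}$$

Now returning to equation (10), we see that

$$\mathbb{P}(\lambda_{N,k} = j) = \mathbb{P}(R_{N-k} > j - 1) \left(1 - \frac{k}{N}\right)^{j-1} \frac{k}{N} + \mathbb{P}(R_{N-k} = j) \sum_{n=j+1}^{\infty} \left(1 - \frac{k}{N}\right)^{n-1} \frac{k}{N} \tag{12}$$

$$= \frac{(N-k-1)_{j-1}}{(N-k)^{j-1}} \cdot \frac{(N-k)^{j-1}}{N^{j-1}} \cdot \frac{k}{N} + \frac{j}{N-k} \cdot \frac{(N-k-1)_{j-1}}{(N-k)^{j-1}} \cdot \frac{(N-k)^{j}}{N^{j}} \tag{13}$$

$$= \frac{(j+k)(N-k-1)_{j-1}}{N^j} \tag{14}$$

as desired.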



3.2 Relation to using Wilson's algorithm 

Yet another derivation of formula (3) is provided by Wilson's algorithm [10]; to explore this connection,
we will need to introduce some preliminaries on trees. A rooted tree $T = (V, E, r)$ consists of a vertex set
$V$, an edge set $E \subseteq V \times V$, and a distinguished vertex $r \in V$ called the root, such that for any non-root
vertex $v \in V$, there is a unique directed sequence of edges that leads from $v$ to $r$, and such that there are
no undirected loops, i.e., from any vertex there does not exist a sequence of distinct undirected edges which
leads back to that vertex.

Wilson's algorithm can make use of the random walk on the complete graph to generate a random tree
with $N$ vertices labeled by $[N]$ as follows: Let $T_0$ be the one-point tree with root and vertex labeled 1.
Suppose that $T_0, \ldots, T_{n-1}$ have been defined, with respective vertex sets $V_0 = \{1\}, V_1, \ldots, V_{n-1}$. If $T_{n-1}$
has $N$ vertices, stop the algorithm and output $T_{n-1}$. Otherwise, pick some vertex $v \in [N] - V_{n-1}$ (it does
not matter how one chooses $v$), and run a random walk on the complete graph from $v$ until it hits some
vertex in $V_{n-1}$. Loop-erase this random walk, and add the loop-erased path to $T_{n-1}$ to form the tree $T_n$,
still with root labeled 1, and vertex set $V_n$. Also, call this loop-erased path from $v$ to $V_{n-1}$ a macrostep of
the algorithm.
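
The description above translates almost line by line into code. The sketch below is one possible rendering under stated assumptions: the function name `wilson_complete_graph` is hypothetical, start vertices are taken in increasing order (any rule is allowed), and each step is drawn uniformly from $[N]$, so self-loops merely delay the walk.

```python
import random


def wilson_complete_graph(N, rng):
    """Generate a rooted tree on {1, ..., N} with root 1 via loop-erased walks.

    Returns a dict parent with parent[v] = next vertex on the path from v
    to the root, for every non-root vertex v.
    """
    parent = {}
    in_tree = {1}
    for v in range(2, N + 1):            # any order of start vertices works
        if v in in_tree:
            continue
        path = [v]                       # loop-erased walk from v (a macrostep)
        while path[-1] not in in_tree:
            x = rng.randint(1, N)        # uniform step; self-loops only delay
            if x in path:
                del path[path.index(x) + 1:]   # erase the loop just closed
            else:
                path.append(x)
        for a, b in zip(path, path[1:]):  # attach the macrostep to the tree
            parent[a] = b
            in_tree.add(a)
    return parent
```

Calling `wilson_complete_graph(8, random.Random(1))` returns parent pointers for the vertices 2 through 8, and following them from any vertex leads to the root 1.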

According to [10], the random tree generated by Wilson's algorithm applied to the complete graph with
$N$ vertices is uniformly distributed among all rooted trees labeled by $[N]$ with root 1; call this the uniform
spanning tree with root 1 (where the word spanning is used to signify that the tree has the full set of $N$
vertices). Suppose that we start Wilson's algorithm at a vertex $N$. The first macrostep is just the loop-erased
walk from $N$ to 1, and thus the length of this macrostep has the same distribution as $\lambda_{N,1}$. On the other hand,
Wilson's algorithm implies that this macrostep is also the path from $N$ to 1 in the uniform spanning tree
with root 1. If $H_{N,1}$ is the length of this path, i.e., the number of edges in the path, then Wilson's algorithm
clearly implies that $H_{N,1} \overset{d}{=} \lambda_{N,1}$. Moreover, Meir and Moon [8] proved that $H_{N,1} \overset{d}{=} (R_N - 1 \mid R_N > 1)$, and
thus Wilson's algorithm coupled with this result in [8] yields an alternative proof of formula (3) in the case
$k = 1$. In fact, the methods of Wilson's algorithm and Meir and Moon can be applied to a random rooted
forest labeled by $[N]$ (i.e., a collection of trees with $N$ total vertices labeled by $[N]$) with a fixed set of roots
labeled by $[k]$, which is uniform among all such rooted forests labeled by $[N]$ with the same root set $[k]$, to
prove formula (3) for arbitrary $k$.

The result of Meir and Moon made use of Cayley's formula for the enumeration of forests. Thus our
derivation of formula (3) yields a novel proof of Cayley's formula:

Corollary 2 (Cayley's formula). The number $t_{N,k}$ of forests with $N$ vertices labeled by $[N]$ and $k$ rooted
trees with root set $[k]$ is given by $t_{N,k} = kN^{N-k-1}$.

Proof. As mentioned above, applying Wilson's algorithm with root 1, started at some other vertex $v$, proves
that $\lambda_{N,1}$ (the length of the loop-erased random walk from $v$ to 1) and $H_{N,1}$ (the edge-distance between
$v$ and 1 in a spanning tree labeled by $[N]$ which is uniform among all spanning trees with root $r = 1$) have the
same distribution. For any fixed $1 \le j \le N - 1$, we want to count the number of spanning trees with root
$r$ such that the edge-distance from $v$ to $r$ equals $j$. We have to choose the $j - 1$ vertices in the path from $v$
to $r$, and then each vertex on the path (including $v$ and $r$) may be considered the root of a tree; thus, there
are $(N-2)_{j-1}\, t_{N,j+1}$ such spanning trees. Thus, it follows that

$$\frac{(j+1)(N-2)_{j-1}}{N^j} = \mathbb{P}(\lambda_{N,1} = j) = \mathbb{P}(H_{N,1} = j) = \frac{(N-2)_{j-1}\, t_{N,j+1}}{t_{N,1}}. \tag{15}$$

In the particular case $j = N - 1$, it is obvious that $t_{N,j+1} = 1$, so this equality implies that $t_{N,1} = N^{N-2}$.
But now substituting this into equation (15) yields Cayley's formula for arbitrary $j$. □
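
For small $N$, Cayley's formula $t_{N,k} = kN^{N-k-1}$ can be confirmed by exhaustive enumeration. The sketch below uses a hypothetical function name and a deliberately naive method: it counts the parent maps on the non-root vertices $\{k+1, \ldots, N\}$ whose pointers never cycle, so that every non-root vertex leads to a root in $[k]$.

```python
from itertools import product


def count_rooted_forests(N, k):
    """Count forests on [N] with root set [k] by brute force over parent maps."""
    non_roots = list(range(k + 1, N + 1))
    count = 0
    for parents in product(range(1, N + 1), repeat=len(non_roots)):
        parent = dict(zip(non_roots, parents))
        acyclic = True
        for v in non_roots:
            seen = set()
            while v in parent:           # follow parent pointers toward a root
                if v in seen:            # revisiting a vertex means a cycle
                    acyclic = False
                    break
                seen.add(v)
                v = parent[v]
            if not acyclic:
                break
        count += acyclic
    return count
```

The enumeration examines $N^{N-k}$ maps, so it is practical only for very small $N$; for instance it gives 125 for $N = 5$, $k = 1$, in agreement with $5^{3}$.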

Lyons and Peres [7] use very similar reasoning in applying Wilson's algorithm to calculate $t_{N,1}$, by
computing the probability of a particular tree, namely a path of length $N - 1$.



4 Some scaling limit results 

4.1 Connections to Poisson and Rayleigh processes 

See Pittel's paper [9] for a discussion of related results where a similar distribution involving the lengths
of gaps between 1's of an independent Bernoulli sequence arises. Pittel shows that the sequence of lengths
of macrosteps obtained when applying Wilson's algorithm to the complete graph with $N$ vertices has the
same distribution as the sequence of spacings between successes of independent Bernoulli($j/N$) variables,
$2 \le j \le N$. For each $N = 1, 2, \ldots$, let $(Y_{N,j}, j = 1, 2, \ldots)$ be such an independent sequence of Bernoulli
random variables, where for $1 \le j \le N$, $Y_{N,j}$ has parameter $j/N$, and for $j > N$, $Y_{N,j} = 1$. Let $Z_{N,1}$ be the
smallest index $j$ such that $Y_{N,j} = 1$, let $Z_{N,2}$ be the second-smallest index $j$ such that $Y_{N,j} = 1$, and define
$Z_{N,j}$ similarly. Then for any positive integer $m$, as $N \to \infty$,

$$\frac{1}{\sqrt{N}} (Z_{N,1}, \ldots, Z_{N,m}) \xrightarrow{d} (P_1, \ldots, P_m),$$

where $P_1 < P_2 < \cdots$ are the successive points of an inhomogeneous Poisson point process on $[0, \infty)$ with rate
$t$ at time $t$; that is, at every continuity point of the distribution function of $(P_1, \ldots, P_m)$, the distribution
function of $(Z_{N,1}, \ldots, Z_{N,m})/\sqrt{N}$ converges to the distribution function of $(P_1, \ldots, P_m)$. This follows from, e.g.,
results of §2.6]. 

A similar type of scaling limit comes up when analyzing repeat values. Let $R_{N,1} := R_N$ be the index of the
first repeat in an i.i.d. sequence of random variables uniform on $[N]$, and let $R_{N,2}$ be the second-smallest
index $i$ such that $X_i \in \{X_0, \ldots, X_{i-1}\}$, let $R_{N,3}$ be the third-smallest such index, and so on. Then (see [2]
and work cited there) the sequence $(R_{N,i}, i = 1, 2, \ldots)$ has the same finite-dimensional scaling limits as the
sequence $(Z_{N,j}, j = 1, 2, \ldots)$: for any positive integer $m$, as $N \to \infty$,

$$\frac{1}{\sqrt{N}} (R_{N,1}, \ldots, R_{N,m}) \xrightarrow{d} (P_1, \ldots, P_m).$$

This latter scaling limit result is an integral part of one of Aldous's constructions of the Brownian continuum
random tree [1].

But now, using the fact that $\lambda_{N,k} \overset{d}{=} (R_N - k \mid R_N > k)$ from Theorem 1, we obtain the following:

Corollary 3. For a fixed $\mu > 0$, as $N \to \infty$, $\lambda_{N, \lfloor \mu \sqrt{N} \rfloor} / \sqrt{N}$ converges in distribution to $(P_1 - \mu \mid P_1 > \mu)$,
where $P_1$ is the first point of an inhomogeneous Poisson point process on $[0, \infty)$ with rate $t$ at time $t$.

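This result can also be used to provide an estimate for $\nu_{N,k}$, the number of colors in the loop-erased
random walk from $N$ to $[k]$. It is clear that conditional on $\lambda_{N,k} = j$, the expected value of $\nu_{N,k}$ is given by

$$\mathbb{E}(\nu_{N,k} \mid \lambda_{N,k} = j) = 1 + \sum_{i=1}^{j-1} \frac{1}{k+i+1}.$$

It thus follows that

$$\mathbb{E}(\nu_{N,k}) = \sum_{j=1}^{N-k} \mathbb{P}(\lambda_{N,k} = j) \left( 1 + \sum_{i=1}^{j-1} \frac{1}{k+i+1} \right).$$

But as $N \to \infty$ with $k = \lfloor \mu \sqrt{N} \rfloor$ and $j = \lfloor x \sqrt{N} \rfloor$, the term $1 + \sum_{i=1}^{j-1} \frac{1}{k+i+1}$
approaches $1 + \log\left(1 + \frac{x}{\mu}\right)$, and by Corollary 3 the term $\sqrt{N}\, \mathbb{P}(\lambda_{N, \lfloor \mu \sqrt{N} \rfloor} = j)$ approaches
$(\mu + x) \exp\left(-\frac{x^2}{2} - \mu x\right)$. Therefore, as $N \to \infty$,

$$\mathbb{E}(\nu_{N, \lfloor \mu \sqrt{N} \rfloor}) \to \int_0^{\infty} \left( 1 + \log\left(1 + \frac{x}{\mu}\right) \right) (\mu + x) \exp\left(-\frac{x^2}{2} - \mu x\right) dx \tag{16}$$

$$= 1 - \log \mu + \exp\left(\frac{\mu^2}{2}\right) \int_{\mu}^{\infty} t \log t \, \exp\left(-\frac{t^2}{2}\right) dt \tag{17}$$

after some simplification and the substitution $t = x + \mu$.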

An alternative derivation of this scaling limit comes from the fact that the length of the loop-erased
random walk increases at unit speed until a length $j$ when a loop occurs, after which its new length is
uniformly distributed among $\{0, 1, \ldots, j\}$. As such, it is closely related to the standard Rayleigh process
$(R_t, t \ge 0)$ [4], defined as follows: let $R_0 = 0$, and for $P_{i-1} \le t < P_i$ (with the convention that $P_0 = 0$), let
$R_t$ grow at unit speed; at each time $P_i$, let $R_{P_i}$ be selected uniformly within the interval $(0, R_{P_i -})$. If we
also make note of the basic fact that $\zeta_{N,k}$, the geometric time at which the walk from $N$ hits $[k]$, satisfies
the scaling limit $\zeta_{N, \lfloor \mu \sqrt{N} \rfloor} / \sqrt{N} \xrightarrow{d} X_\mu$, where $X_\mu$ has an exponential distribution with parameter $\mu$, then
the following is clear:

Corollary 4. As $N \to \infty$, $\lambda_{N, \lfloor \mu \sqrt{N} \rfloor} / \sqrt{N}$ converges in distribution to $R_{X_\mu}$, where $(R_t, t \ge 0)$ is the standard
Rayleigh process, and $X_\mu$ is independent of $(R_t, t \ge 0)$ and has an exponential distribution with parameter
$\mu$.

Using the fact from [1] that $R_t \overset{d}{=} P_1 \wedge t$, where $P_1$ has the standard Rayleigh distribution, shows that
the scaling limits in Corollaries 3 and 4 have the same distribution.

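Consider a finite $k$, condition on the length of the loop-erased random walk from $N$ to $[k]$ equalling $j$,
and let $C_1 < \cdots < C_{\nu_{N,k} - 1}$ be the color-changing indices, i.e., those indices $i$ in $[j - 1]$ such that $B_i$ occurs. By
Theorem 1 it follows that, for indices within $[j - 1]$,

$$\mathbb{P}(C_1 > i) = \left(1 - \frac{1}{k+2}\right) \cdots \left(1 - \frac{1}{k+i+1}\right) = \frac{k+1}{k+i+1},$$

$$\mathbb{P}(C_m > i + \ell \mid C_{m-1} = i) = \left(1 - \frac{1}{k+i+2}\right) \cdots \left(1 - \frac{1}{k+i+\ell+1}\right) = \frac{k+i+1}{k+i+\ell+1}.$$

If we now let $k = \lfloor \mu \sqrt{N} \rfloor$, and let $N \to \infty$, then it follows that $C_1 / \sqrt{N} \xrightarrow{d} D_1$, where $\mathbb{P}(D_1 > x) = \frac{\mu}{\mu + x}$.
More generally, we see that: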

Proposition 2. As $N \to \infty$,

$$\frac{1}{\sqrt{N}} \left(C_1, \ldots, C_{\nu_{N, \lfloor \mu \sqrt{N} \rfloor} - 1}\right) \xrightarrow{d} (D_1, \ldots, D_{\nu(\mu)}),$$

where $D_1 < D_2 < \cdots$ are the points of an inhomogeneous Poisson point process with rate $1/(\mu + t)$ at time $t$, and
$\nu(\mu) := \sup\{i : D_i < R_{X_\mu}\}$.

The sequence $(D_1, \ldots, D_{\nu(\mu)})$ is called the sequence of ladder indices of the standard Rayleigh process
up to time $X_\mu$: if, at a jump time $P_i$, the Rayleigh process jumps down to some $Q_i$ uniformly chosen in
$(0, R_{P_i -})$, then the sequence $(D_1, \ldots, D_{\nu(\mu)})$ is the subsequence $(Q_{i_1}, \ldots, Q_{i_{\nu(\mu)}})$ of $(Q_1, Q_2, \ldots)$ up to time
$X_\mu$ such that for each index $i_c$, $Q_{i_c} < Q_{i_c + m}$ for all $m = 1, 2, \ldots$.

As a check, note that conditional on $R_{X_\mu} = x$, the number of colors in the scaled walk is one plus a
Poisson random variable with parameter $\log(1 + x/\mu)$, and thus the expected value of this number of colors
is

$$1 + \int_0^{\infty} \log\left(1 + \frac{x}{\mu}\right) (x + \mu) \exp\left(-\frac{x^2}{2} - \mu x\right) dx,$$

which is the same as derived in equation (16).
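
The limiting expectation can be compared numerically with the exact finite-$N$ expectation obtained from formula (3) and the conditional color-change probabilities $1/(k+i+1)$ of Theorem 1. The sketch below is only a rough check under stated assumptions: the function names are hypothetical, the integral is truncated at a finite upper limit and evaluated by the midpoint rule, and agreement is expected only up to an error that shrinks as $N$ grows.

```python
import math


def limit_expected_colors(mu, upper=12.0, steps=120000):
    """Midpoint-rule value of
    1 + int_0^upper log(1 + x/mu) (x + mu) exp(-x^2/2 - mu x) dx."""
    h = upper / steps
    total = 0.0
    for s in range(steps):
        x = (s + 0.5) * h
        total += math.log1p(x / mu) * (x + mu) * math.exp(-x * x / 2 - mu * x)
    return 1.0 + total * h


def exact_expected_colors(N, k):
    """E(nu_{N,k}) = sum_j P(lambda = j) (1 + sum_{i=1}^{j-1} 1/(k+i+1))."""
    expected = 0.0
    ratio = 1.0       # (N-k-1)_{j-1} / N^{j-1}, starting at j = 1
    harmonic = 0.0    # sum_{i=1}^{j-1} 1/(k+i+1), starting at j = 1
    for j in range(1, N - k + 1):
        pmf = (k + j) / N * ratio            # formula (3)
        expected += pmf * (1.0 + harmonic)
        harmonic += 1.0 / (k + j + 1)        # add the term i = j
        ratio *= (N - k - j) / N             # extend the falling factorial
    return expected
```

For $\mu = 1$, comparing `exact_expected_colors(10000, 100)` with `limit_expected_colors(1.0)` gives two values that differ only in the second decimal place.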



4.2 Stick-breaking 

In the previous subsection, letting $k = \lfloor \mu \sqrt{N} \rfloor$ led to scaling limit results which were closely related to
particular Poisson and Rayleigh processes on $[0, \infty)$. Fixing $k$ while letting $N \to \infty$ leads to a very different
result. To explore this, first note that Theorem 1 also relates to some other basic results on random
compositions. See [5, §1.5] and [3, pp. 52-53] for discussions of how the record times of a sequence of i.i.d.
random variables $(W_i, i = 1, 2, \ldots)$, i.e., the times $j$ such that $W_j > W_i$ for all $1 \le i \le j - 1$, are distributed
like the occurrences of 1's in a sequence of independent Bernoulli($\frac{1}{j}$) random variables.

In the setting of Theorem 1, let $(\sigma_{N,k,1}, \ldots, \sigma_{N,k,\nu_{N,k}})$ denote the random composition of $\lambda_{N,k}$ representing
the numbers of edges of each color, working along the colored loop-erased path from $N$ to $[k]$, and consider
the reversed sequence of segment lengths

$$(\sigma'_{N,k,1}, \ldots, \sigma'_{N,k,\nu_{N,k}}) := (\sigma_{N,k,\nu_{N,k}}, \ldots, \sigma_{N,k,1}). \tag{18}$$

Also, suppose that we condition on $\lambda_{N,k} = j$ for some $1 \le j \le N - k$, so that if $C'_i$ for $1 \le i \le j - 1$
is the indicator of whether $i$ is a partial sum of the sequence $(\sigma'_{N,k,m}, 1 \le m \le \nu_{N,k})$, then the variables
$(C'_i, 1 \le i \le j - 1)$, conditioned on $\lambda_{N,k} = j$, are independent Bernoulli($\frac{1}{k+j-i+1}$) random variables. We
know from the proof of Theorem 1 that $\mathbb{P}(\sigma'_{N,k,1} = j \mid \lambda_{N,k} = j) = \frac{k+1}{k+j}$ (since this corresponds to there being only one
color in the loop-erased walk); however, because of Theorem 1 we now also see that for $1 \le n \le j - 1$,

$$\mathbb{P}(\sigma'_{N,k,1} = n \mid \lambda_{N,k} = j) = \left(1 - \frac{1}{k+j}\right) \left(1 - \frac{1}{k+j-1}\right) \cdots \left(1 - \frac{1}{k+j+2-n}\right) \frac{1}{k+j+1-n} = \frac{1}{k+j},$$

so that, given that it is less than $j$, $\sigma'_{N,k,1}$ is equally likely to be any of $1, 2, \ldots, j - 1$.

It is easy to generalize this, to see that conditional on $(\sigma'_{N,k,1}, \ldots, \sigma'_{N,k,m}) = (k_1, \ldots, k_m)$ with $c :=
k_1 + \cdots + k_m < j$,

$$\mathbb{P}(\sigma'_{N,k,m+1} = j - c) = \frac{k+1}{k+j-c} \tag{19}$$

$$\mathbb{P}(\sigma'_{N,k,m+1} = k_{m+1}) = \frac{1}{k+j-c}, \quad 1 \le k_{m+1} \le j - c - 1 \tag{20}$$

Now since $\lambda_{N,k} \to \infty$ in probability as $N \to \infty$, we see that as $N \to \infty$,

$$\frac{1}{\lambda_{N,k}} (\sigma'_{N,k,1}, \ldots, \sigma'_{N,k,\nu_{N,k}}) \xrightarrow{d} (U_1, (1 - U_1) U_2, (1 - U_1)(1 - U_2) U_3, \ldots) \tag{21}$$

where the $(U_i, i = 1, 2, \ldots)$ are independent uniform(0, 1) random variables. The right-hand side of (21) is
known as the continuous uniform stick-breaking process defined by the $(U_i, i = 1, 2, \ldots)$.
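
The uniformity of the first reversed segment can be verified directly from the independent indicators. The sketch below assumes, as in the discussion above, that given $\lambda_{N,k} = j$ the indicator $C'_i$ of a color change at reversed position $i$ has parameter $1/(k+j-i+1)$; the function name is illustrative.

```python
from fractions import Fraction


def first_reversed_segment_pmf(k, j):
    """P(sigma'_1 = n | lambda = j) for n = 1, ..., j, computed from
    independent indicators C'_i ~ Bernoulli(1/(k+j-i+1)), 1 <= i <= j-1."""
    pmf = {}
    no_change = Fraction(1)          # P(C'_1 = ... = C'_{n-1} = 0)
    for n in range(1, j):
        p = Fraction(1, k + j - n + 1)
        pmf[n] = no_change * p       # first change occurs at position n
        no_change *= 1 - p
    pmf[j] = no_change               # no change at all: a single color
    return pmf
```

For $k = 3$ and $j = 7$, every value $n < j$ receives mass $1/10$ and the value $n = j$ receives mass $4/10$, matching $\frac{1}{k+j}$ and $\frac{k+1}{k+j}$.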

4.3 Markovian and non-Markovian properties of the colored walk

Although the sequence of lengths L n of the loop-erased walk is a Markov chain, we will now argue that the 
sequence of compositions of L n defined by the lengths of colored segments of the loop-erased walk is not a 
Markov chain. 

Recall that a composition of a positive integer $\ell$ is a sequence of positive integers with sum $\ell$. Here
the terms of the sequence represent the lengths of stretches of edges of the same color in a loop-erased
walk with total length $\ell$. Such a composition of $\ell$ is conveniently encoded by the string of $\ell$ binary bits
$(y_1, \ldots, y_\ell)$ where $y_1 = 1$ and $y_i$ is the indicator of a color change between the $(i-1)$th and $i$th edges. So
the number of compositions of $\ell$ is $2^{\ell - 1}$. If $y = (y_1, \ldots, y_\ell)$ and $z = (z_1, \ldots, z_m)$ are compositions of $\ell$ and
$m$ respectively, call $y$ a truncation of $z$ if $\ell \le m$, $y_i = z_i$ for $1 \le i \le \ell - 1$, and $y_\ell \le z_\ell$. We introduce also
a trivial composition, corresponding to a sequence with no terms, which is regarded as a truncation of every
composition of a positive integer.

For $n = 0, 1, \ldots$, let $C_n$ denote the composition induced by coloring the segments of the loop-erasure of
$(X_0, \ldots, X_n)$, where it is understood that if the loop-erasure has one vertex and no edges, then $C_n$ is the
trivial composition. Observe that $X_{n+1}$ belongs to the set of values in the loop-erasure of $(X_0, \ldots, X_n)$ if
and only if $C_{n+1}$ is a truncation of $C_n$. The sequence $(C_n, n = 0, 1, \ldots)$ has the following dynamics:

• If $C_n$ is the trivial composition, then $C_{n+1}$ is the trivial composition with probability $1/N$, and $C_{n+1} =
(1)$ with probability $1 - 1/N$.

• If $C_n$ is some non-trivial composition $c$ of $\ell$, then $C_n$ is either some truncation of $C_{n-1}$ or an extension
of $C_{n-1}$ by one term, according to whether or not $X_n$ creates a loop:

- if $X_n$ creates a loop, then $C_{n+1}$ extends $C_n$ by adding the bit 1 with probability $(N - \ell - 1)/N$,
while otherwise $C_{n+1}$ is equally likely to be each of the $\ell + 1$ possible truncations of $C_n$;

- if $X_n$ does not create a loop, then $C_{n+1}$ extends $C_n$ by adding the bit 0 with probability
$(N - \ell - 1)/N$, while otherwise $C_{n+1}$ is equally likely to be each of the $\ell + 1$ possible truncations of $C_n$.

The sequence $(C_n, n = 0, 1, \ldots)$ is not a Markov chain because the transition dynamics from $C_n$ to $C_{n+1}$
depend on whether $X_n$ created a loop, which is determined by the relationship between $C_{n-1}$ and $C_n$.
However, this analysis does show that the sequence of pairs of compositions, $((C_n, C_{n+1}), n = 0, 1, \ldots)$, is a
Markov chain.

5 Open questions 

This article has examined loop-erased random walk which stops when we reach some marked subset of $[N]$
of size $k$, at which point the walk has a random length $\lambda_{N,k}$; in this case, we showed that conditional on
the value of $\lambda_{N,k}$, the composition of color segments was distributed as the lengths of spacings between 1's
of a sequence of Bernoulli random variables. What if we stop the walk when the length of the loop-erasure
reaches $m$, for some $1 \le m \le N - k$? Will the resulting composition of colors have a similar distribution,
as the lengths of spacings? How will the composition be distributed if we just stop at some fixed finite time
$m$, and look at the colored loop-erasure of $(X_0, \ldots, X_m)$? The situation appears to be like the Ray-Knight
description of the distribution of Brownian local times, where results are much simpler for suitable stopping
times than for fixed times.

In Section 4 we derived a number of results which hold at the end of the stopped walk. But in this
situation too, what happens at an intermediate stage? In the case of $k = \lfloor \mu \sqrt{N} \rfloor$, it seems that the length of
the loop-erased random walk, considered as a stochastic process, should converge to the standard Rayleigh
process.

Finally, it seems that there is some sort of "critical" behavior when $k$ is of order of magnitude $\sqrt{N}$, as
detailed in Section 4. What happens to the colored walk when $k = o(\sqrt{N})$ or $k = \omega(\sqrt{N})$? In the former
case, the walk does not get stopped by an exponential time, and it appears that the number of colors of the
stopped walk should increase as $\log N$. In the latter case, in some sense the walk gets stopped before it can
make loops, and it appears that the number of colors should converge to 1 almost surely.